Scientific Data
○ Springer Science and Business Media LLC
Preprints posted in the last 7 days, ranked by how well they match Scientific Data's content profile, based on 174 papers previously published here. The average preprint has a 0.11% match score for this journal, so anything above that is already an above-average fit.
Wolters, F. C.; Woldu Semere, T.; Schranz, M. E.; Medema, M. H.; Bouwmeester, K.; van der Hooft, J. J. J.
Plants produce the most diverse blends of specialized metabolites on earth. Natural products derived from plants are valuable resources for drug development, food chemistry, and crop resistance breeding. Phenotypes of specialized metabolite profiles can be captured by untargeted mass spectrometry across species phylogeny, tissues, and genotypes. Here, we collected metabolic fingerprints of 17 Brassicaceae species across three tissues (paired leaf and root; flower) using liquid chromatography-tandem mass spectrometry (LC-MS/MS) in positive and negative ionization mode. Corresponding metadata have been refined for reuse according to ReDU guidelines, and for integration with public genomic and transcriptomic data. Standardization of in vitro growth conditions and data processing workflows enables integration of acquired raw and processed data across platforms for single- and multi-omics analysis. Further, the inclusion of tissue-specific metabolic profiles across ploidy levels, as well as across crop species and wild relatives, makes this dataset a valuable resource for natural product discovery.
Madan, R.; Crane, P. K.; Gennari, J. H.; Latimer, C. S.; Choi, S.-E.; Grabowski, T. J.; Mac Donald, C. L.; Hunt, D.; Postupna, N.; Bajwa, T.; Webster, J.
Quantitative neuropathology has advanced through whole-slide imaging and digital histology platforms. Yet, these measurements rarely align with neuroimaging coordinate frameworks that may be useful for spatial modeling and other applications. QNPtoVox, short for quantitative neuropathology to voxels, is a reproducible, modular pipeline that transforms quantitative metrics generated by digital pathology software (HALO) into voxel-based maps registered to a standard common coordinate (MNI) template. The workflow integrates digital histopathology, gross tissue photography, ex-vivo MRI, and nonlinear registration to generate spatially standardized 3D pathology representations. This Methods article provides a complete procedural description, including required materials, step-wise instructions, operator-dependent checkpoints, expected outputs, reproducibility evaluation, and troubleshooting. QNPtoVox enables voxel-level integration of neuropathology with neuroimaging tools, unlocking existing histopathology datasets for computational modeling and cross-cohort harmonization.
Al-Naji, A.; Schubotz, R. I.; Zahedi, A.
Research in cognitive neuroscience has relied on simple, highly controlled stimuli due to the difficulty in developing standardized, ecologically valid stimulus sets. However, there is a consensus that using ecologically valid stimuli is imperative to generalize results beyond controlled laboratory settings. The current study introduces a naturalistic audio stimulus database, consisting of short, recognizable, and emotionally rated stimuli. To create such a database, the current study collected 291 audio files from a wide range of sources. A total of 361 participants rated the audio clips on emotionality, arousal, and recognizability, and subsequently freely described each clip by typing what they believed the sound to be. The text responses of the participants were embedded and clustered using an unsupervised machine-learning algorithm to derive a participant-grounded organization of auditory object categories. The results indicate that the audio clips were easily recognizable, while emotionality and arousal ratings showed broad variability, making the database suitable for diverse experimental needs. Furthermore, the final database comprises 10 distinct semantic categories, providing a diverse set of auditory stimuli.
HOUEGNIGAN, L.; Cuesta Lazaro, E.
Increasing human activities along the US west coast are of concern for populations of cetaceans and particularly for a number of large whale species that are recovering from overexploitation during the era of commercial whaling. New rapid monitoring tools, such as satellite imagery analysis powered by recent advances in artificial intelligence, have the potential to provide additional broad-scale and near real-time capacities for survey and monitoring. This paper investigates and demonstrates the feasibility of automatic detection of gray whales in sub-meter satellite imagery off the coast of California, USA. Observations and statistical analysis of regional imagery allowed not only an assessment of their detectability but also the development of robust signal processing and machine learning-based solutions for automated detection. To that end, a regional dataset of 221 gray whales was created using signal processing to inform a deep-learning-based detection framework, and 20 different large neural network architectures for feature extraction, each followed by a support vector machine algorithm for classification, were evaluated for their detection performance. Neural network backbones included 19 convolutional neural networks and 1 transformer network. The best architecture generally achieved satisfactory performance, with an average balanced accuracy reaching up to 99.90%. It is also demonstrated that panchromatic imagery, in spite of the lesser amount of information provided, can be used to perform detection with a relatively high accuracy of 87.05%, allowing wider spatial and temporal coverage. Large-scale deployment of the best performing models over a broad range of regional satellite imagery resulted in the detection of 3353 gray whales, as well as opportunistic detections of humpback, blue and fin whales, over a period from December 28th 2009 to March 26th 2023. 
It also provided meaningful data points concerning the migration routes of gray whales within the Channel Islands and Southern California Bight. The large number of high-confidence detections indicates the capacity for a large-scale monitoring approach to support state and federal conservation policies such as gear mitigation, vessel speed reduction programs, or shipping lane redefinition that could also be expanded to other areas and for other species.
Meisler, S. L.; Cieslak, M.; Bagautdinova, J.; Hendrickson, T. J.; Pandhi, T.; Chen, A. A.; Hillman, N.; Radhakrishnan, H.; Salo, T.; Feczko, E.; Weldon, K. B.; McCollum, r.; Fayzullobekova, B.; Moore, L. A.; Sisk, L.; Davatzikos, C.; Huang, H.; Avelar-Pereira, B.; Caffarra, S.; Chang, K.; Cook, P. A.; Flook, E. A.; Gomez, T.; Grotheer, M.; Hagen, M. P.; Huque, Z. M.; Karipidis, I. I.; Keller, A. S.; Kruper, J.; Luo, A. C.; Macedo, B.; Mehta, K.; Mitchell, J. L.; Pines, A. R.; Pritschet, L.; Rauland, A.; Roy, E.; Sevchik, B. L.; Shafiei, G.; Singleton, S. P.; Stone, H. L.; Sun, K. Y.; Sydnor,
The Adolescent Brain Cognitive Development (ABCD) Study is the largest U.S.-based neuroimaging initiative of adolescent brain maturation. Diffusion MRI (dMRI) provides unique insights into white matter organization, yet applying advanced processing pipelines and managing technical variability across scanning environments remains challenging at scale. To address these issues, we present ABCD-BIDS Community Collection (ABCC) release 3.1.0, including a curated resource of more than 24,000 fully processed ABCD dMRI datasets. ABCC provides fully processed images, nuanced image quality metrics, advanced microstructural measures, and person-specific bundle tractography. Evaluating these rich data revealed that measures of diffusion restriction and non-Gaussianity--in particular the intracellular volume fraction from NODDI and return-to-origin probability from MAP-MRI--were highly sensitive to neurodevelopment and robust to variation in image quality. Additionally, harmonization of microstructural features markedly improved the cross-vendor generalizability of developmental effects. Together, ABCC accelerates reproducible, rigorous research on adolescent white matter development.
Ramirez-Torano, F.; Hatlestad-Hall, C.; Drews, A.; Renvall, H.; Rossini, P. M.; Marra, C.; Haraldsen, I. H.; Maestu, F.; Bruna, R.
Electroencephalography (EEG) preprocessing is a critical yet time-consuming step that often relies on expert-driven, semi-automatic pipelines, limiting scalability and reproducibility across large datasets. In this work, we present sEEGnal, a fully automated and modular pipeline for EEG preprocessing designed to produce outputs comparable to expert-driven analyses while ensuring consistency and computational efficiency. The pipeline integrates three main modules: data standardization following the EEG extension of the Brain Imaging Data Structure (BIDS), bad channel detection, and artifact identification, combining physiologically grounded criteria with independent component analysis and ICLabel-based classification. Performance was evaluated against manual preprocessing performed by EEG experts at two complementary levels: preprocessing metadata (bad channels, artifact duration, and rejected components) and EEG-derived measures. In addition, test-retest analyses were conducted to assess the stability of the pipeline across repeated recordings. Results show that sEEGnal achieves performance comparable to expert-driven preprocessing while preserving key neurophysiological features. Furthermore, the pipeline demonstrates reduced variability and increased consistency compared to human experts. These findings support sEEGnal as a robust and scalable solution for automated EEG preprocessing in both research and large-scale applications. Highlights: Fully automated and modular EEG preprocessing pipeline. Benchmarked against expert-driven preprocessing. Comparable performance in metadata and EEG-derived measures. Demonstrates stable performance in test-retest recordings. BIDS-based framework for reproducible EEG data handling.
Reisberg, S.; Oja, M.; Mooses, K.; Tamm, S.; Sild, A.; Talvik, H.-A.; Laur, S.; Kolde, R.; Vilo, J.
Background: The increasing availability of routinely collected health data offers new opportunities for population-level research, yet access to comprehensive, linked, and standardised datasets remains limited. We describe EST-Health-30, a large-scale, population-representative health data resource from Estonia. Methods: EST-Health-30 comprises a random 30% sample of the Estonian population (~500,000 individuals), with longitudinal data from 2012 to 2024 and annual updates planned through 2026. Individual-level records are linked across five nationwide databases, including electronic health records, health insurance claims, prescription data, cancer registry, and cause of death records. A privacy-preserving hashing approach ensures consistent cohort inclusion over time while maintaining pseudonymisation. All data are harmonised to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (version 5.4) using international standard vocabularies. Data quality was assessed using established OMOP-based validation frameworks. Results: The dataset contains rich multimodal information on diagnoses, procedures, laboratory measurements, prescriptions, free-text clinical notes, healthcare utilisation, and costs, with high population coverage and longitudinal depth. Data quality assessment showed high completeness and consistency, with 99.2% of applicable checks passing. The age-sex distribution closely reflects the national population, supporting representativeness, though coverage is marginally below the target 30% (29.2%), primarily attributable to recent immigrants without health system contact. The dataset enables construction of detailed clinical cohorts, analysis of disease trajectories, and evaluation of healthcare utilisation and outcomes across the life course. Conclusions: EST-Health-30 is a comprehensive, standardised, and population-representative real-world data resource that supports epidemiological, clinical, and methodological research. 
Its alignment with the OMOP CDM facilitates reproducible analytics and participation in international federated research networks, while secure access infrastructure ensures compliance with data protection regulations.
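As a rough illustration of how a hash-based rule can keep cohort membership stable across annual updates while still pseudonymising identifiers, the sketch below derives both a pseudonym and an inclusion decision from a keyed hash of a personal identifier. The abstract does not describe the actual EST-Health-30 scheme, so everything here is an assumption: the HMAC-SHA256 construction, the function names `pseudonymise` and `in_cohort`, the example key, and the identifier format are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held by the data custodian (an assumption, not from the source).
SECRET_KEY = b"example-key-not-a-real-secret"


def pseudonymise(person_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym: the same input always yields the same output."""
    return hmac.new(key, person_id.encode("utf-8"), hashlib.sha256).hexdigest()


def in_cohort(person_id: str, fraction: float = 0.30, key: bytes = SECRET_KEY) -> bool:
    """Decide inclusion from the keyed hash alone, so no membership list is stored.

    The first 8 bytes of the digest are read as an integer in [0, 2**64);
    a person is included when that value falls below fraction * 2**64.
    """
    digest = hmac.new(key, person_id.encode("utf-8"), hashlib.sha256).digest()
    value = int.from_bytes(digest[:8], "big")
    return value < fraction * 2**64


if __name__ == "__main__":
    ids = [f"person-{i}" for i in range(10_000)]  # made-up identifiers
    share = sum(in_cohort(pid) for pid in ids) / len(ids)
    print(f"included share: {share:.3f}")
```

Because the decision depends only on the identifier and the key, a person selected in one year remains selected in every later extract, and newly registered residents are sampled at the same rate without re-drawing the cohort.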
Farrell, G.; Attafi, O. A.; Fragkouli, S.-C.; Heredia, I.; Fernandez Tobias, S.; Harrison, M.; Hermjakob, H.; Jeffryes, M.; Obregon Ruiz, M.; Pearce, M.; Pechlivanis, N.; Lopez Garcia, A.; Psomopoulos, F.; Tosatto, S. C. E.
Unprecedented breakthroughs are being made in life science research through the application of artificial intelligence (AI). However, adherence to method reporting guidelines is necessary to support their reusability and reproducibility. The DOME Copilot solution extracts structured reports of AI methods using a large language model to help interpret manuscripts. It is a fast and efficient resource capable of scaling to annotate the corpus of global AI literature, unlocking value and trust in published methods.
Uus, A.; Fukami-Gartner, A.; Kyriakopoulou, V.; Cromb, D.; Morgan, T.; Arulkumaran, S.; Egloff Collado, A.; Luis, A.; Bos, R.; Makropoulos, A.; Schuh, A.; Robinson, E.; Sousa, H.; Deprez, M.; Cordero-Grande, L.; Bradshaw, C.; Colford, K.; Hutter, J.; Price, A.; O'Muircheartaigh, J.; Hammers, A.; Rueckert, D.; Counsell, S.; McAlonan, G.; Arichi, T.; Edwards, A. D.; Hajnal, J. V.; Rutherford, M. A.; Story, L.
Regional volumetric assessment of perinatal brain development is currently limited by the lack of consistent high quality multi-regional segmentation methods applicable to both fetal and neonatal MRI. We present Multi-BOUNTI, a deep learning pipeline for automated multi-lobe segmentation of fetal and neonatal T2w brain MRI. The method is based on a dedicated 43-label parcellation protocol and a 3D Attention U-Net trained on brain MRI datasets of subjects spanning 21-44 weeks gestational/postmenstrual age. The pipeline integrates preprocessing, segmentation and volumetric analysis, and was evaluated on independent datasets, demonstrating fast (< 10 min/case) and accurate performance with high agreement to manually refined labels. We demonstrate the application of the framework with 267 fetal and 593 neonatal MRI datasets from the developing Human Connectome Project without reported clinically significant brain anomalies to derive normative volumetric growth models across 21-44 weeks GA/PMA. These models were used to characterise developmental trajectories, assess differences between fetal and preterm neonatal cohorts, and analyse longitudinal changes. The resulting normative models were integrated into an automated reporting framework enabling subject-specific volumetric assessment via centiles and z-scores. Multi-BOUNTI provides a unified and scalable approach for perinatal brain segmentation and volumetry, supporting large-scale studies and facilitating future clinical translation. The full pipeline is publicly available at https://github.com/SVRTK/perinatal-brain-mri-analysis.
Chen, Y.-K.; Harker, C. M.; Pham, C. M.; Grundy, L.; Wardill, H. R.; Roach, M. J.; Ryan, F. J.
Shotgun metagenomics has become a cornerstone of microbiome research, yet the complexity of existing workflows remains a major barrier for life scientists without dedicated bioinformatics support. Manual database setup, detailed sample sheet preparation, and management of software dependencies can make routine analysis difficult and time-consuming. Cross-study comparisons are further hampered by inconsistent processing pipelines, database versions, and profiling strategies, limiting reproducibility and the potential for large-scale meta-analyses. We present OpusTaxa, an open-source Snakemake workflow that provides end-to-end processing of short paired-end shotgun metagenomic data with minimal configuration. Users provide either FASTQ files or Sequence Read Archive accessions; OpusTaxa automatically downloads required databases, performs quality control, removes host reads, and executes taxonomic profiling, metagenome assembly, and functional analysis. All analysis modules can be independently toggled, and per-sample outputs are automatically merged into harmonised, cross-sample tables ready for downstream exploration. Across two public datasets, we demonstrate how OpusTaxa can be used to compare consistency across complementary taxonomic profilers and to estimate microbial load in addition to standard metagenomic workflows. Availability: OpusTaxa is freely available at https://github.com/yenkaiC/OpusTaxa. Documentation, test data, and example configurations are included in the repository.
Nolte, K.; Baumbach, J.; Kollmannsberger, P.; Sauer, F. G.; Luehken, R.
1. Diptera represent a diverse insect order, including vectors of human and animal pathogens. Their accurate species identification remains a major bottleneck in ecological and epidemiological studies. Morphological identification requires taxonomic expertise, while molecular methods are costly and not universally reliable. Wing geometric morphometrics offers an alternative, but manual landmark annotation is time-consuming and introduces observer bias. 2. We developed ITHILDIN, an automated pipeline for landmark and semilandmark annotation of Diptera wings, combining UNet++ segmentation and an Hourglass landmark prediction model. Using mosquitoes as the primary model system, we extended an existing repository with 5,793 additional images. Models were trained on 5,991 annotations of landmarks and segmentations and then evaluated on 12,522 images across 34 taxa. We assessed landmark prediction accuracy against human observers and ML-Morph, evaluated species identification using Linear Discriminant Analysis on 17 homologous landmarks and 52 semilandmarks, and tested out-of-distribution generalisation by reproducing an independent study. Transferability was demonstrated by adapting the pipeline to the Dipteran families Drosophilidae and Glossinidae. 3. The Hourglass model achieved a mean landmark error of 4.5 pixels (95% CI: 4.3-4.6), within human observer variability (4.7 pixels, 95% CI: 4.4-5.0) and substantially outperforming ML-Morph (12.7 pixels, 95% CI: 11.1-14.2). The semilandmark-based approach for species identification achieved 91% balanced accuracy across 34 taxa, comparable to CNN performance (94%). On out-of-distribution data, the landmark pipeline generalised substantially better than the CNN, and a soft-voting ensemble of the landmark and CNN classifiers achieved 88% balanced accuracy on a replicated study. 4. Combining geometric morphometrics with deep learning provides a reproducible, interpretable, and generalisable alternative to black-box CNN classifiers for Diptera wing analysis. By acting as a consistent single observer comparable to human annotation, the system eliminates inter-observer bias, enabling large-scale and cross-study morphometric analyses of Dipteran wings. The system is publicly available at www.ithildin.bnitm.de and transferable to other Diptera families with moderate retraining effort.
Data availability: Images used in this study are accessible under a CC BY 4.0 license at https://doi.org/10.6019/S-BIAD1478. A downloadable and installable Docker application can be accessed on the application's git page: https://anonymous.4open.science/r/ITHILDIN-4313/
Preston, J. D.; Abadiotakis, H.; Tang, A.; Rust, C. J.; Halkos, M. E.; Daneshmand, M. A.; Chan, J. L.
Clinical research dissemination is frequently hindered by administrative friction and methodological inconsistency. To address these barriers, we developed TernTables, a freely available, open-source web application (https://www.tern-tables.com/) and R package (https://cran.r-project.org/package=TernTables) that streamlines the transition from raw data to formatted results for descriptive and univariate clinical reporting. The system integrates a client-side screening protocol for protected health information (PHI) with a rule-based decision tree that selects and executes appropriate frequency-based, parametric, or non-parametric statistical tests based on data distribution and class. TernTables generates publication-ready summary tables in Microsoft Word format, complemented by dynamically generated methods text and the underlying R code to ensure complete transparency and reproducibility. Validation using a landmark clinical trial dataset demonstrated concordance with established biostatistical approaches for descriptive and univariate analyses. TernTables is designed to supplement, not replace, formal statistical consultation by standardizing routine descriptive and univariate workflows, allowing biostatistical expertise to be focused on complex analyses and study design. By lowering technical and financial barriers, the platform democratizes access to rigorous statistical workflows while maintaining methodological excellence and reducing "researcher degrees of freedom."
Ivanov, V.; Uludag, K. O.; Schöneberg, Y.; Schneider, J. M.; Kennedy, S.; Hamadou, A. B.; Vink, C. J.; Krehenwinkel, H.
Widow spiders of the genus Latrodectus are important animals for biomedical, pest and conservation research. Here, we present the assembled genomes of two closely related Latrodectus species: the Australian L. hasselti and the New Zealand endemic L. katipo. The genome of L. katipo consists of 13 scaffolds likely corresponding to chromosomes (90% of the total length) and 1267 short scaffolds (10%). It has a total length of 1.5 Gbp and a BUSCO score of 94.9%. The genome of L. hasselti consists of 379 scaffolds and has a total length of 1.7 Gbp and a BUSCO score of 95.4%. The repeat content is very similar in both genomes, with a total proportion of 37.2% for L. katipo and 39.9% for L. hasselti. Genome annotation predicted 12706 and 15111 genes for L. katipo and L. hasselti, respectively. An ortholog analysis shows large overlap between orthogroups, suggesting either duplication events in L. hasselti or loss of genes in L. katipo.
Chang, H.-h.; Cardan, R.; Nedunoori, R.; Fiveash, J.; Popple, R.; Bodduluri, S.; Stanley, D. N.; Harms, J.; Cardenas, C.
Optimizing radiotherapy dose distributions remains a resource-intensive bottleneck. Existing AI-based dose prediction methods often have limited generalizability because they rely on small, heterogeneous datasets. We present nnDoseNetv2, an auto-configured, end-to-end framework for dose prediction across diverse disease sites (head and neck, prostate, breast, and lung), prescription levels (1.5-84 Gy), and treatment modalities (IMRT, VMAT, and 3D-CRT). By integrating machine-specific beam geometry with 3D structural information, the framework is designed to generalize across varied clinical scenarios. A single multi-site model was trained on 1,000 clinical plans. On sites seen during training, performance was comparable to specialized site-specific models. On unseen sites (liver and whole brain), the model outperformed site-specific models, with mean absolute errors of 2.46% and 6.97% of prescription, respectively. These results suggest that geometric awareness can bridge disparate anatomical domains while eliminating the need for site-specific model maintenance, providing a scalable and high-fidelity approach for personalized radiotherapy planning.
Ramm, K.; Brown, C.; Arneth, A.; Rounsevell, M.
We present a spatially explicit, global-scale index to assess the effects of the five direct anthropogenic drivers of biodiversity loss identified by the IPBES: land use change, natural resource extraction, climate change, pollution, and invasive alien species. The Biodiversity Pressure Index (BPI) covers 30 years (1990-2020) with an annual time-step and a spatial resolution of 0.1°. We find that the coverage of drivers in available data varies and we highlight the key uncertainties that result from this. Using the best available data, we show that large parts of the terrestrial biosphere (approximately 89%, including Antarctica and Greenland) are under medium or high human pressure and that almost all areas (approximately 96%) have experienced an increase in pressure over the past three decades. The BPI shows varied spatial and temporal patterns across world regions and biomes, but many of these areas are dominated by pressures associated with rising temperatures and trade flows. Tropical and subtropical areas are subject to particularly rapidly growing pressures, while wetlands consistently show the highest pressure levels across biomes. In revealing these and other patterns, the BPI provides a basis for improved understanding and management of biodiversity impacts in the future.
Gazquez, J.; Camacho Cadena, C.; He, W.; Yamada, E.; Altekoester, C.; Soyka, F.; Laakso, I.; Hirata, A.; Joseph, W.; Tarnaud, T.; Tanghe, E.
International guidelines for low-frequency electromagnetic field exposure (LF EMF) are primarily intended to prevent substantiated adverse effects. In the frameworks, limits on internal electric fields are linked to external exposure levels through computational dosimetry. However, the relationship between internal electric fields and these adverse effects remains incompletely understood. In particular, current approaches often overlook the morphological complexity and diversity of cortical neurons, which may limit the realism of neuronal activation estimates used to support these assessments. This study evaluates LF EMF-induced neural activation using 25 morphologically realistic neuron models spanning all cortical layers, embedded within 11 detailed human head models. The internal electric fields were simulated for uniform magnetic field exposures (100 Hz-100 kHz) along the three anatomical directions, and excitation thresholds were computed using a multi-scale framework combining voxel-based dosimetry with biophysical neuron simulations. A real-world exposure scenario involving a child near an acousto-magnetic article-surveillance deactivator was also analyzed. Thresholds varied across cell type, morphology, cortical location, subject anatomy, frequency, and exposure direction, with L2/3 pyramidal, L4 basket, and L5 thick-tufted pyramidal cells showing the lowest thresholds. Despite this variability, all simulated thresholds were conservative with respect to the basic restrictions and dosimetric reference limits set by IEEE ICES and ICNIRP. The smallest margin occurred at 100 kHz, where the threshold remained a factor of 2.8 above the corresponding limit. These findings indicate that current LF EMF exposure limits remain conservative when evaluated using highly detailed, morphology-based CNS activation models.
Dill, R.; Amakhobe, T.; Oballa, G.; Ojenge, G.; Adibe, F.; Peng, J.; Okoth, S.; Osano, A.
Endophytic fungi residing within medicinal plants are emerging as prolific sources of structurally diverse bioactive secondary metabolites with applications in drug discovery. Azadirachta indica (Neem) and Melia azedarach (Melia), members of the Meliaceae family, are renowned for their rich phytochemical composition; however, the contribution of their endophytic fungal communities to this chemical diversity remains largely unexplored. Herein, endophytic fungi were isolated from leaves and bark of Neem and Melia collected in Kenya and cultured under two distinct physical conditions, solid (plates) and liquid (broth) media, to assess how culture environment influences compound production. Compounds were extracted and analyzed using gas chromatography-mass spectrometry (GCMS) to profile the chemical diversity associated with each endophytic fungus, physical culture state, and host plant. GCMS analysis revealed that while host plant identity influenced the presence of specific compounds, the dominant determinant of chemical diversity was the intrinsic biosynthetic capacity of the endophytic fungi themselves. Several compounds were unique to endophytic fungi cultures, highlighting their role as independent sources of bioactive compounds. Culture conditions moderately influenced metabolite profiles, demonstrating the importance of optimizing growth environments in experimental design and natural product bioprospecting. From the Neem samples, we found 53 compounds uniquely present in the broth samples (consisting of Neem powder and endophytic fungi), 22 found exclusively with the endophytic fungi from the Neem, and 31 compounds shared between the broth and the endophytic fungi samples. In Melia samples, 109 compounds were uniquely present in broth samples from the Melia plant (consisting of Melia powder and endophytic fungi), 22 compounds were found exclusively with the endophytic fungi from the Melia, and 55 were shared between the broth and the endophytic fungi samples. 
Our comparative analysis of the Neem and Melia endophytic fungi-exclusive samples identified 12 shared compounds. Ten compounds were unique to Neem and ten were unique to Melia; however, their identities varied between the two categories. While GCMS enabled the identification of volatile and semi-volatile metabolites, future studies employing complementary metabolomic approaches, such as liquid chromatography-mass spectrometry (LCMS), ultra-high-performance liquid chromatography MS/MS (UHPLC MS/MS), or nuclear magnetic resonance (NMR) spectroscopy, would expand coverage to non-volatile, polar, and high molecular weight compounds, providing a more comprehensive understanding of endophyte-derived chemical diversity. These findings provide insights into the interplay between medicinal plants and their endophytes and establish a foundation for leveraging endophytic fungi from Neem and Melia as scalable sources of structurally complex natural products for pharmaceutical and biotechnological applications while minimizing ecological impact.
Wolters, F. C.; Woldu Semere, T.; Schranz, M. E.; Medema, M. H.; Bouwmeester, K.; van der Hooft, J. J. J.
Plants produce diverse bouquets of specialized metabolites (SMs), yet only a fraction of the vast phytochemical space has been explored to date. Comparative analysis of SM profiles can reveal hotspots of biochemical novelty, while systematic profiling across taxonomic levels does not presently cover large plant families. To study core and accessory SM profiles in the Brassicaceae plant family, we fingerprinted 14 species by liquid chromatography-tandem mass spectrometry (LC-MS/MS). We developed standardized experimental and computational workflows integrating in silico annotation tools to study consensus compound class and substructure distributions of SMs. Furthermore, we investigated the congruence of chemotaxonomy and species phylogeny across an extended panel of 17 species. Unique metabolite profiles were outstanding in Camelina sativa, Capsella rubella, and B. vulgaris, with the largest unique terpenoid profile annotated in C. sativa, accounting for 33.5% and 55.6% in positive and negative ionization mode, respectively. Substructure motifs were found to overlap with compound class predictions, highlighted for triterpenoids in Camelinodae. Furthermore, dual-tissue chemotaxonomic clustering resembled relationships of Brassica subgenomes across tissues. We anticipate that our systematic approach can serve as a blueprint for investigating biochemical diversity in other plant lineages and can boost the characterization of plant natural product pathways.
Brito Pacheco, D.; Giannopoulos, P.; Reyes-Aldasoro, C. C.
This paper investigates the way in which mitochondria distribute and align inside HeLa cells observed with serial block-face scanning electron microscopy. Four models of alignment were considered: (1) mitochondria exhibiting no discernible alignment pattern, (2) mitochondria aligned pointing towards the nucleus of the cell, (3) mitochondria aligned all in one direction when viewed from above, (4) mitochondria aligned tangent to the surface of the nucleus. These models were named (1) unaligned, (2) petals, (3) racecars, and (4) clouds. The mitochondria, nucleus and plasma membrane of 25 individual cells were segmented. A total of 12,299 mitochondria were identified and analysed. Alignment of the major axis of each mitochondrion was calculated in two ways: relative to a ray that joins it to the centroid of the nucleus, and relative to a ray that joins it to the nucleus surface. Results indicate that mitochondria tend to align tangentially to the nucleus surface, i.e., a clouds model. In addition, differences in the spatial distributions of the mitochondria were found and quantified with clearly defined metrics. The methodology here presented can be extended to other acquisition settings where the distribution and alignment of cells could be important, for instance, histopathology.
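To make the first of the two alignment measures concrete, here is a minimal sketch of the angle between a mitochondrion's major axis and the ray joining it to the nucleus centroid. It is not the authors' code: the function name `alignment_angle_deg`, the use of plain 3D coordinate tuples, and the choice to fold the angle into the range 0 to 90 degrees (since an axis has no preferred direction) are assumptions made for illustration.

```python
import math
from typing import Sequence


def alignment_angle_deg(
    major_axis: Sequence[float],
    mito_centroid: Sequence[float],
    nucleus_centroid: Sequence[float],
) -> float:
    """Angle in degrees (0 to 90) between a major axis and the ray to the nucleus centroid.

    Values near 0 mean the mitochondrion points towards the nucleus;
    values near 90 mean it lies perpendicular to that ray.
    """
    # Ray from the mitochondrion centroid to the nucleus centroid.
    ray = [n - m for n, m in zip(nucleus_centroid, mito_centroid)]

    dot = sum(a * r for a, r in zip(major_axis, ray))
    norm_axis = math.sqrt(sum(a * a for a in major_axis))
    norm_ray = math.sqrt(sum(r * r for r in ray))
    if norm_axis == 0.0 or norm_ray == 0.0:
        raise ValueError("major axis and ray must both be non-zero vectors")

    # abs() folds opposite directions together; min() guards against rounding above 1.
    cos_theta = min(1.0, abs(dot) / (norm_axis * norm_ray))
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    # Axis along x, nucleus also along x: pointing at the nucleus.
    print(alignment_angle_deg((1, 0, 0), (0, 0, 0), (5, 0, 0)))
    # Axis along y, nucleus along x: perpendicular to the ray.
    print(alignment_angle_deg((0, 1, 0), (0, 0, 0), (5, 0, 0)))
```

Under this convention, small angles would correspond to the "petals" model and angles near 90 degrees to an arrangement perpendicular to the centroid ray; the second measure in the abstract replaces the centroid with a point on the nucleus surface.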
German Mesner, I.; Lake, D. E.; Kausch, S. L.; Krahn, K. N.; Gummadi, A.; Clark, T. W.; Niestroy, J. C.; Sahni, R.; Vesoulis, Z. A.; Gootenberg, D. B.; Ambalavanan, N.; Travers, C. P.; Fairchild, K. D.; Sullivan, B. A.
Premature very low birth weight (VLBW) infants have high rates of mortality and morbidity from sepsis, necrotizing enterocolitis, and respiratory failure requiring intubation and mechanical ventilation. Earlier detection of cardiorespiratory deterioration using vital signs from continuous physiological monitoring may lead to more timely interventions and improved outcomes. To further this research area, we present PreMo, a publicly available dataset of continuous heart rate and oxygen saturation, demographics, clinical events, and outcomes for 3,829 VLBW patients from four Neonatal Intensive Care Units (NICUs) in the United States. The PreMo dataset consists of a collection of parquet files, RO-Crate metadata, and sample usage code scripts hosted on the University of Virginia LibraData Dataverse website.